BACKGROUND OF THE INVENTION
Field of the Invention 

[0002] This invention relates generally to the field of data processing and, 

more particularly, to the automated analysis and mining of concepts from 
unstructured data. 

Related Art 

[0003] Structured data or objects generally refer to data existing in an 

organized form, such as a relational database, that can be accessed and 
analyzed by conventional techniques (e.g., Structured Query Language, SQL).
By contrast, so-called unstructured data or objects refer to objects in a textual
format (e.g., faxes, e-mails, documents, voice converted to text) that do not
necessarily share a common organization. Unstructured information often 
remains hidden and un-leveraged by an organization primarily because it is 
hard to access the right information at the right time or to integrate, analyze, or
compare multiple items of information as a result of their unstructured nature.
There exists a need for a system and method to provide structure for 
unstructured information such that the unstructured objects can be accessed 
with powerful conventional tools (such as, for example, SQL, or other 
information query and/or analysis tools) and analyzed for hidden trends and 
patterns across a corpus of unstructured objects. 
[0004] Conventional systems and methods for accessing unstructured objects 

have focused on tactical searches that seek to match keywords. These 
conventional systems and methods have several shortcomings. For example,
assume a tactical search engine accepts search text. For purposes of 
illustration, suppose information about insects is desired and the user-entered 
search text is 'bug'. The search engine scans available unstructured objects, 
including individual objects. In this example, one unstructured object 
concerns the Volkswagen bug, one is about insects at night, one is about 
creepy-crawlies, one is about software bugs, and one is about garden bugs. 
The tactical search engine performs keyword matching, looking for the search 
text to appear in at least one of the unstructured objects. In this 'bug' 
example, only those objects about the Volkswagen bug, software bugs, and 
garden bugs actually contain the word 'bug' and will be returned. The objects 
about insects at night, and creepy-crawlies may have been relevant to the 
search but unfortunately were not identified by the conventional tactical search 
engine. 
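
By way of illustration only, the keyword-matching behavior described above can be sketched in a few lines of Python. The corpus text, the object identifiers, and the function name keyword_search are hypothetical and are chosen only to mirror the 'bug' example; they do not represent any particular search engine.

```python
# Hypothetical five-object corpus mirroring the 'bug' example above.
OBJECTS = {
    "vw": "The Volkswagen bug is a compact car.",
    "night": "Many insects are active at night.",
    "crawlies": "Creepy-crawlies live under rocks.",
    "software": "A software bug crashed the program.",
    "garden": "Garden bugs can damage tomato plants.",
}


def keyword_search(query, objects):
    """Return the ids of objects whose text contains the query string.

    This is plain case-insensitive substring matching, which is the
    tactical keyword search criticized in the text, not a concept search.
    """
    q = query.lower()
    return sorted(oid for oid, text in objects.items() if q in text.lower())
```

Calling keyword_search("bug", OBJECTS) in this sketch returns only the Volkswagen, software, and garden objects; the two insect-related objects that never use the word 'bug' are missed.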

[0005] One conventional method of addressing this problem allows a user to 

enter detailed searches utilizing phrases or Boolean logic, but successful 
detailed tactical searches can be extremely difficult to formulate. The user 
must be sophisticated enough to express their search criteria in terms of 
Boolean logic. Furthermore, the user needs to know precisely what he or she 
is searching for, in the exact language that they expect to find it. Thus, there is 
a need for a search mechanism to more easily locate documents or other 
objects of interest, preferably searching with the user's own vocabulary. 
Further, such a mechanism should desirably enable automatically searching
related words and phrases, without knowledge of advanced searching
techniques. 

[0006] In another conventional method, the search is done based on meaning, 

where each of the words or phrases typed is semantically analyzed, as if 
second guessing the user (for example, use of the term "juvenile" picks up
"teenager"). This increases the result set and thus makes analysis of search
results even more important. Also, this technique can be inadequate and quite 
inaccurate when the user is looking for a concept like "definition of terrorism" 
or "definition of knowledge management," where the "concept" of the phrase 
is more important than the meaning of the individual words in the search term. 
[0007] Even when tactical searches succeed in searching or finding 

information, the problem of analyzing unstructured information still remains. 
Analyzing unstructured information goes beyond the ability to locate 
information of interest. Analysis of unstructured information would allow a 
user to identify trends in unstructured objects as well as to quickly identify the 
meaning of an unstructured object, without first having to read or review the 
entire document. Thus, there further exists a need to provide a system and 
methodology for analyzing unstructured information. 
[0008] Prior art classification systems exist that can organize unstructured 

objects in a hierarchical manner. However, utilizing these classification 
systems to locate an object of interest requires knowing what the high-level of 
interest would be, and following one path of inquiry often precludes looking at 
other options. 

[0009] Some prior art technologies store data and information utilizing 

proprietary methods and/or data structures. This prevents widespread or open 
access or analysis by keeping objects in a native non-standard proprietary 
format. Thus, there is a need to store captured information about unstructured 
objects in an open architecture and preferably in a readily accessible standard 
storage format. 
[0010] The present invention provides a system and method for transforming

an initial set of unstructured and/or structured information objects into a 
knowledge discovery platform where actionable intelligence is elucidated and 
further discovery is made possible. Generally, the present invention provides
the ability to link both structured and unstructured information for analysis in 
order to define new business rules and methods. The complex interactions of 
an organization at all levels with internal and external clients may be 
encapsulated for analysis using the methods of the present invention. The 
integration of all available information sources and business stakeholders 
results in a more comprehensive analysis of the information sources available 
to the organization, thus enhancing decision making. A stakeholder is an
entity that interacts with an organization. Stakeholders include people internal 
and external to the organization as well as electronic devices interacting with 
the organization. 

[0011] The present invention transforms currently available unstructured or 

structured data into a knowledge discovery platform. For example, an 
important ingredient in capturing the essential information needs of an 
organization is ongoing feedback received from multiple stakeholders. The 
ongoing feedback refines concepts leading to improved analysis and output. 
The knowledge discovery component reveals information gaps that need to be 
filled as the organization evolves. These continual completions and 
refinements at multiple points using unbiased integrated structured and 
unstructured data analytics to reveal information gaps in the method lead to a 
positive cycle of enhancements. 

[0012] More specifically, the present invention provides a system and method 

for transforming an initial set of unstructured and/or structured information 
objects into a knowledge discovery platform for actionable intelligence. 
Furthermore, this knowledge discovery platform provides the architecture for 
discovering and refining current and novel information by synchronizing
information needs with information collection and analysis using integrated 
unstructured and structured knowledge discovery analytics. 
[0013] The method of the invention includes identifying an electronic path to 

at least one object for inclusion in an initial set of objects. This object can be, 
for example, an electronic file from one or more databases, text, graphic, 
voice, tactile or taste formats. The method uses at least one
application/algorithm to extract at least one concept from the objects, in these
various formats, that form the initial set of objects, thereby creating an initial set of concepts. Relationships
among these concepts may be determined, verified and refined using 
references such as thesauri, dictionaries or other industry specific references 
and by then applying standard natural language processing techniques. 
[0014] A thorough understanding of a current set of initial concepts is derived 

using multidimensional analysis. This analysis permits all of the stakeholders 
to define the boundary of their information needs. Furthermore, 
multidimensional analysis may discover at least one additional concept to 
create a second set of concepts. The addition of this discovered concept alters 
the information needs boundary. The multidimensional analysis and discovery 
process is then repeated until no additional useful concepts either within or 
outside of the organization can be found. The method optionally deletes 
concepts based on multidimensional analysis and discovery. 
[0015] The perpetual, cyclical feedback of multiple stakeholders interacting to 

refine concepts through multidimensional analysis and discovery redefines the 
information needs boundary leading to new and relevant information 
collection (and optional deletion) to converge on a dynamically changing 
information boundary as the interactions of the various stakeholders evolve 
within and outside the organization. It is this comprehensive and complete 
information collection that permits a comprehensive and complete analysis 
and output to fulfill the information needs of all stakeholders at all levels both 
within and outside an organization. 
[0016] Further embodiments, features, and advantages of the present

invention, as well as the structure and operation of the various embodiments of 
the present invention, are described below with reference to the accompanying 
drawings. 

[0017] An advantage of the present invention is that it provides a system and 

method for tracking and optionally reporting the changing presence of words 
or phrases in a set of documents over time. 
[0018] Another advantage of the invention is that it provides a system and 

method that can recognize relevant relationships between words and concepts, 
and can identify an object under more than one level of interest. The present 
invention scans objects for words or phrases and determines the presence of 
certain patterns that suggest the meaning or theme of a document, allowing for 
more accurate classification and retrieval. 
[0019] Yet another advantage of the present invention is that it provides a 

relational database as a storage format, of which many types are known. 
Storage in a relational database keeps the information readily available for 
analysis by common tools. Where access protection is desired, various known 
security measures may be employed, as are known in the art. The present 
invention provides a theme or concept-based method and system to analyze, 
categorize and query unstructured information. 

BRIEF DESCRIPTION OF THE FIGURES 

[0020] These and other features of the invention are more fully described 

below in the detailed description and accompanying drawings. 
[0021] FIG. 1 is a flowchart showing the high level operation of the invention 

according to an embodiment. 
[0022] FIG. 2 is a flowchart showing the operation of deleting concepts 

according to an embodiment of the present invention. 
[0023] FIG. 3 is a flowchart showing the process of extracting concepts 

according to an embodiment of the present invention. 



[0024] FIG. 4 is a flowchart showing the process of refining concepts 

according to an embodiment of the present invention. 
[0025] FIG. 5 is a flowchart showing the process of refining concepts 

according to another embodiment of the present invention. 
[0026] FIG. 6 is a flowchart showing the process of refining concepts 

according to another embodiment of the present invention. 
[0027] FIGs. 7A and 7B are a flowchart showing the process of performing 

multi-dimensional analysis on the concepts according to an embodiment of the
present invention.

[0028] FIG. 8 is a flowchart showing the process of generating reports and 

presenting analysis according to an embodiment of the present invention. 
[0029] FIG. 9 is a flowchart showing the process of storing and sharing 

concepts according to an embodiment of the present invention. 
[0030] FIG. 10 is a flowchart showing the process of creating business rules 

according to an embodiment of the present invention. 
[0031] FIGs. 11-23 are screen shots of graphical user interfaces utilized by the

present invention according to an example embodiment. 
[0032] FIG. 24 illustrates data visualization according to an embodiment of 

the present invention. 

DETAILED DESCRIPTION OF THE INVENTION 

[0033] An embodiment of the present invention is now described with 

reference to the figures, where like reference numbers indicate identical or 
functionally similar elements. Also in the figures, the left-most digit of each 
reference number corresponds to the figure in which the reference number is 
first used. While specific configurations and arrangements are discussed, it 
should be understood that this is done for illustrative purposes only. A person 
skilled in the relevant art will recognize that other configurations and 
arrangements can be used without departing from the spirit and scope of the 
invention. It will be apparent to a person skilled in the relevant art that this 
invention can also be employed in a variety of other devices and applications
such as, but not limited to, financial services, wireless telecommunication 
services, insurance services, high technology, manufacturing, retail, and 
consumer products. 
I. Invention Overview

[0034] FIG. 1 is a flowchart showing the high level operation of the invention 

according to an example embodiment. The method starts at step 102 where 
control transfers to step 104. In step 104, at least one object is selected for 
inclusion in an initial set of objects to be analyzed. In an embodiment of the 
invention, an object is a source of information, such as a textual document, an 
email, a web page, a spreadsheet, or any other container (or containers) of 
information or concepts that may or may not be formatted. Control then 
transfers to step 106. 

[0035] In step 106, at least one application/algorithm is used to extract the at 

least one concept from the initial set of objects to create an initial set of 
concepts. Step 106 is further described with reference to FIG. 3 below. 
Control then passes to step 108. In step 108, the at least one concept is refined 
based on its relationships to other concepts. Step 108 is further described with
reference to FIGs. 4-6 below. Control then passes to step 110.
[0036] The terms "application" and "algorithm" are used herein to refer to a 

method or mathematical algorithm, typically implemented in computer 
software as a series of logical steps, that performs some function. These 
functions are generally associated herein with identifying concepts in objects. 
Example functions include performing speech recognition, identifying features 
in a graphical image, doing word look-ups in a dictionary or thesaurus, 
discovering embedded relationships in the words or phrases and the like. 
[0037] In step 110, multi-dimensional analysis is performed on the at least one
concept present in the initial set of objects. Step 110 is further described with
reference to FIGs. 7A and 7B below. Control then passes to step 112. In step
112, based on step 110, it is determined (i.e., discovered) whether at least one
additional concept exists. Control then passes to step 114. In step 114, if an
additional concept exists, then control passes back to step 106 for creation of a
second set of concepts. Otherwise, control passes to step 116.
[0038] In step 116, it is determined (i.e., discovered) whether at least one 

additional object exists outside the initial set of objects. Here, a second set of 
objects will be created that includes the at least one additional object and the 
objects in the initial set of objects. Control then passes to step 118. In step 
118, if at least one additional object exists, then control passes back to step 106.
Otherwise, control passes to step 120 where the flowchart in FIG. 1 ends. 
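
For illustration only, the control flow of FIG. 1 may be sketched as the following Python loop. The function run_discovery and its callable parameters (extract, refine, analyze, find_new_objects) are hypothetical placeholders for steps 106 through 118, and the max_rounds safety limit is an assumption of this sketch rather than a feature of the flowchart. The sketch also simplifies the flowchart by evaluating the concept check (step 114) and the object check (step 118) together at the end of each pass.

```python
def run_discovery(initial_objects, extract, refine, analyze,
                  find_new_objects, max_rounds=10):
    """Iterate extraction, refinement, and analysis until nothing new appears.

    extract(objects)           -> set of concepts          (step 106)
    refine(concepts)           -> set of refined concepts  (step 108)
    analyze(concepts)          -> set of discovered extras (steps 110, 112)
    find_new_objects(concepts) -> iterable of objects      (step 116)
    """
    objects = list(initial_objects)
    concepts = set()
    for _ in range(max_rounds):  # assumed safety bound, not in FIG. 1
        extracted = refine(extract(objects))
        discovered = analyze(extracted)
        new_concepts = (extracted | discovered) - concepts
        concepts |= new_concepts
        new_objects = [o for o in find_new_objects(concepts)
                       if o not in objects]
        # Steps 114 and 118: loop again only if something new was found.
        if not new_concepts and not new_objects:
            break
        objects.extend(new_objects)
    return objects, concepts
```

Passing the four steps in as callables keeps the loop independent of any particular extraction or analysis technique, which matches the open-ended way the steps are described.
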
[0039] As described above in step 110 of FIG. 1, multi-dimensional analysis 

is performed on the at least one concept present in the initial set of objects. 
FIG. 8 is a flowchart that starts from step 110 and shows the process of 
generating reports and presenting analysis according to an embodiment of the 
present invention. As illustrated in FIG. 8, control passes from step 110 to 
step 802. In step 802, reports are generated based on the multi-dimensional 
analysis. Control passes then to step 804. In step 804, the analysis is 
presented in a graphical or visual format. Control then passes to step 808 
where the flowchart in FIG. 8 ends. 
[0040] As described above, step 112 of FIG. 1 determines whether at least one

additional concept exists to create a second set of concepts. FIG. 9 is a 
flowchart that starts from step 112 and shows the process of storing and 
sharing concepts according to an embodiment of the present invention. As 
illustrated in FIG. 9, control passes from step 112 to step 902. In step 902, the 
at least one concept is stored in a concept repository. Control then passes to 
step 904. In step 904, the stored concepts are shared with other users. Control 
then passes to step 906 where the flowchart in FIG. 9 ends. 
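
As a minimal sketch of steps 902 and 904, a concept repository might be kept in a relational table so that other users can query it with ordinary SQL. The example uses the sqlite3 module from the Python standard library; the table name concept, its two columns, and the helper function names are assumptions made for illustration and are not a schema taken from this description.

```python
import sqlite3


def open_repository(path=":memory:"):
    """Open (or create) a hypothetical concept repository."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concept ("
        " name TEXT PRIMARY KEY,"
        " concept_type TEXT)"
    )
    return conn


def store_concepts(conn, concepts):
    """Store (name, concept_type) pairs, replacing any existing entry."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO concept (name, concept_type)"
            " VALUES (?, ?)",
            concepts,
        )


def list_concepts(conn):
    """Return all stored concept names in alphabetical order."""
    rows = conn.execute("SELECT name FROM concept ORDER BY name")
    return [row[0] for row in rows]
```

With a file path in place of ":memory:", several users could open the same repository file, which is one simple way of sharing the stored concepts.
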
[0041] Another embodiment of the present invention involving deletion of 

additional concepts is shown in the flowchart of FIG. 2. Referring to FIG. 2, 
control passes from step 110 of FIG. 1 to step 202. In step 202, based on step 
110, it is determined whether at least one concept needs to be deleted from the 
initial set of concepts. Control then passes to step 204. In step 204, if at least 
one concept needs to be deleted, then the concept is deleted from the initial set 
of concepts to create a second set of concepts. Control then passes to step 
206. 

[0042] In step 206, if the at least one additional concept needs to be deleted, 

then control passes back to step 106 in FIG. 1. Otherwise, control passes to 
step 208. In step 208, it is determined whether the at least one additional 
object exists outside the initial set of objects. Here, a second set of objects 
will be created that includes the at least one additional object and the objects 
in the initial set of objects. Control then passes to step 210. In step 210, if at
least one additional object exists, then control passes back to step 106 in FIG.
1 to include the additional object in the initial set of objects to create a second
set of objects. Otherwise, control passes to step 212 where the flowchart in
FIG. 2 ends.

[0043] FIG. 10 illustrates an additional step for the flowchart in FIG. 2. FIG. 

10 is a flowchart showing the process of creating business rules according to 
an embodiment of the present invention. From step 210 of FIG. 2, control 
passes to step 1002 in FIG. 10. In step 1002, business rules are created to be 
used in transformation of data into a database. Control then passes to step 
1004 where the flowchart in FIG. 10 ends. Different types of objects are
described next. 

II. Object Types: Structured, Unstructured and Semi-Structured

[0044] As described above, an object may be a source of information, such as, 

for example, a single textual document, an email, a web page, a spreadsheet, 
or any other container (or containers) of information or concepts that may or 
may not be formatted. Objects may be classified as three different types 
including structured, unstructured and semi-structured types. 
[0045] In an embodiment of the present invention, unstructured data is a 

collection of free form textual information that may or may not be formatted. 
This includes, but is not limited to, emails, web pages, documents, 
spreadsheets, and text columns in any type of database. 
[0046] In an embodiment of the present invention, structured data is a 

collection of preclassified and presorted objects that have defined and usually 
unambiguous relationships to other data in the structured data collection. 
These objects are usually stored in databases such as relational databases of 
the type, for example, made by Oracle Corporation of Redwood City, 
California or Microsoft Corporation of Redmond, Washington. 
[0047] In an embodiment of the present invention, semi-structured data is 

either: (1) structured data containing unstructured information such as text 
columns in a structured data column to capture user comments (At some level 
these comment columns have a defined relationship to all other data objects. 
However, an analysis of the contents in this comments column may go 
through natural language processing techniques to yield relevant and 
actionable outputs.); or (2) unstructured data may have structured components 
embedded within it such as tables inside a Microsoft Word document or a 
largely unstructured object containing some structured components, such as 
the "To", "From", and "Subject" fields of an email. 
III. Objects Comprised of Other Objects

[0048] Objects of the present invention may be comprised of other objects. 

For example, a corpus may be defined as a collection of objects. The 
integration of all object types in all domains within varying levels of 
unstructured and structured components is through the extraction of concepts. 
An example of linking structured and unstructured components for textual data 
is to rename the unstructured file with a relevant primary key id (or a 
combination of relevant keys/ids) of its corresponding structured component 
in the database. How the present invention extracts concepts is described next. 
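
The renaming approach to linking an unstructured file with its structured component can be sketched as follows, for illustration only. The function name link_by_rename and the choice to keep the original file extension are assumptions of this sketch; the description above specifies only that the file is renamed with the relevant primary key.

```python
from pathlib import Path


def link_by_rename(file_path, primary_key):
    """Rename an unstructured file after the primary key of its database row.

    The original extension is kept (an assumption of this sketch), so that
    'call_notes.txt' with key 10042 becomes '10042.txt' in the same folder.
    """
    src = Path(file_path)
    dst = src.with_name(f"{primary_key}{src.suffix}")
    if dst.exists():
        # Refuse to overwrite a file already linked to this key.
        raise FileExistsError(str(dst))
    src.rename(dst)
    return dst
```

Once renamed in this way, the file can be located from the structured record by its key alone, without a separate lookup table.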

IV. Extracting Concepts 

[0049] As described above in step 106 of FIG. 1, at least one 

application/algorithm is used to extract the at least one concept from the initial 
set of objects to create an initial set of concepts. Step 106 is further described 
with reference to the flowchart in FIG. 3. FIG. 3 illustrates extraction of 
concepts from a variety of different object types. While the figure is presented 
in flowchart format, this is done only for convenience of explanation. The 
illustrated steps may be done in parallel or in any order. Furthermore, some 
steps may be omitted and/or other steps added depending on the object types 
(e.g., textual, graphical, human tactile or other sensory objects) that are present 
for processing. 

[0050] The flowchart in FIG. 3 starts at step 302 where control passes to step 

304. In step 304, an application programming interface (API) is used to obtain 
a common format of the at least one object. Control then passes to step 306. 
In step 306, an extraction application/algorithm is used to obtain a common 
format of the at least one object. There are commercially available extraction 
algorithms that operate on different domains that may be used to obtain 
concepts in a common and analyzable format. Control then passes to step 308. 



[0051] In step 308, an application is used to determine image intensity of the 

at least one object. Although many other properties of an image may be 
determined, a common first step to concept extraction usually includes 
determining the pixel intensity in an image. The properties of a pixel's intensity
include color. An example application that can be used to determine image
intensity is Adobe Photoshop 7, available from Adobe Systems, Inc., San Jose, 
California. Control then passes to step 310. 
[0052] In step 310, an application is used to determine at least one boundary 

(an atomic entity) within the at least one object. From this atomic entity, other 
features of an image may be determined that may lead to concepts such as 
boundaries among objects within the image as well as their identities. An 
example application that can be used to determine at least one boundary 
is MaskWarrior v1.0, available from Imagiam High Image Techs,
SL, Barcelona, Spain. Control then passes to step 312.
[0053] In step 312, an application is used to map audio waveforms within the 

at least one object to a text format. There are commercially available 
applications to record voices and transcribe them to text files for concept 
extraction. An example application that can be used to transcribe recorded 
voice information is AudioMining & XML Speech Indexing, available from 
Scansoft, Inc., Peabody, Massachusetts. Control then passes to step 314. 
[0054] In step 314, an application is used to convert non-textual information 

within the at least one object into text. An example application for tactile 
objects that can be used to convert non-textual information into text is 
ConTacts Discrete Tactile Sensors, available from Pressure Profile Systems, 
Inc. of Los Angeles, California. An example application for olfactory and/or 
taste objects that can be used to convert non-textual information into text is 
AROMATRAX®, available from Microanalyses of Round Rock, Texas. 
Control then passes to step 316 where the flowchart in FIG. 3 ends. 
[0055] A common theme to each of the applications discussed with respect to 

FIG. 3 is that the application input may or may not be textual, but the 
application outputs are in a textual format. Other technologies are available to 
record objects beyond human perception including, for example, an infra-red
optoelectronics temperature sensor having temperature limits to 1100°C such
as FiberView 12200 Series from the Williamson Corporation of Concord, 
Massachusetts. How the present invention refines concepts is described next. 

V. Refining Concepts 

[0056] As described above in step 108 of FIG. 1, at least one concept is 

refined based on its relationships to other concepts. FIGs. 4-6 each further
describe step 108.

[0057] FIG. 4 starts at step 402 where control passes to step 404. In step 404, 

the relationship of the at least one concept to another concept within the initial 
set of objects is determined. Control then passes to step 406 where the 
flowchart in FIG. 4 ends. 

[0058] FIG. 5 starts at step 502 where control passes to step 504. In step 504, 

the relationship of the at least one concept to another concept outside the 
initial set of objects is determined. Control then passes to step 506 where the 
flowchart in FIG. 5 ends. 

[0059] FIG. 6 illustrates other embodiments of refining a concept based on its 

relationship to other concepts. While the figure is presented in flowchart 
format, this is done only for convenience of explanation. The illustrated steps 
may be done in parallel or in any order. Furthermore, some steps may be 
omitted and/or other steps added. FIG. 6 starts at step 602 where control 
passes to step 604. In step 604, a relationship of the at least one concept to 
another concept within an existing reference is determined. Here, the existing 
reference may be, for example, an English thesaurus, an English dictionary, a 
non-English thesaurus, a non-English dictionary, a domain specific thesaurus, 
a domain specific dictionary, etc. Control then passes to step 606. 
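
By way of example only, step 604 might be sketched as a look-up in a small thesaurus held in memory. The THESAURUS contents and the function name related_concepts are hypothetical; an actual embodiment could consult any of the references listed above.

```python
# Hypothetical miniature thesaurus: each entry maps a term to related terms.
THESAURUS = {
    "bug": {"insect", "creepy-crawly"},
    "insect": {"bug"},
}


def related_concepts(concept, thesaurus):
    """Return terms related to the concept in either direction.

    A term is related if the thesaurus lists it under the concept, or
    lists the concept under it, so one-way entries still link both terms.
    """
    direct = set(thesaurus.get(concept, ()))
    reverse = {term for term, related in thesaurus.items()
               if concept in related}
    return (direct | reverse) - {concept}
```

In this sketch "creepy-crawly" has no entry of its own, yet related_concepts("creepy-crawly", THESAURUS) still yields "bug" because the reverse direction is checked.
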
[0060] In step 606, a relationship of the at least one concept to another 

concept using a natural language processing (NLP) algorithm is determined. 
Control then passes to step 608. In step 608 a frequency of occurrence of the 
at least one concept within the initial set of objects is determined. Control
then passes to step 610. In step 610, a frequency of occurrence of the at least 
one concept outside the initial set of objects is determined. Control then 
passes to step 612. 

[0061] In step 612, a normalized frequency of occurrence of the at least one 

concept within the initial set of objects is determined. Control then passes to 
step 614. In step 614, a normalized frequency of occurrence of the at least one 
concept outside the initial set of objects is determined. Control then passes to 
step 616 where the flowchart in FIG. 6 ends. The multi-dimensional analysis 
of the present invention is described next. 
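
The frequency determinations of steps 608 through 614 can be illustrated with the following sketch. The normalization used here, occurrences divided by the total number of whitespace-separated words in the set, is an assumption of this example only; the description above does not prescribe a particular normalization. The function names are likewise hypothetical.

```python
import re


def count_occurrences(concept, texts):
    """Count whole-word, case-insensitive occurrences of a concept."""
    pattern = re.compile(r"\b" + re.escape(concept) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in texts)


def normalized_frequency(concept, texts):
    """Occurrences per word in the set (assumed normalization)."""
    total_words = sum(len(text.split()) for text in texts)
    if total_words == 0:
        return 0.0  # an empty set has no meaningful frequency
    return count_occurrences(concept, texts) / total_words
```

Applying the same two functions to a set of texts within the initial set of objects and to a set outside it gives comparable figures even when the two sets differ greatly in size, which is the usual reason for normalizing.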

VI. Multi-Dimensional Analysis 

[0062] As described above in step 110 of FIG. 1, multi-dimensional analysis 

is performed on the at least one concept present in the initial set of objects. 
This step includes slicing-and-dicing across at least one dimension of the 
initial set of objects. Examples of such a dimension include but are not
limited to a time dimension, a geographical location dimension, an electronic 
location dimension, a person dimension, a multiple-person dimension, a 
business unit dimension, an organization dimension, a process dimension, a 
product dimension, a service dimension, a subject dimension, a category 
dimension, a concept dimension, a concept type dimension, a user viewpoint 
dimension, and an entity dimension in a structured database. 
[0063] Step 110 is further described next with reference to FIGs. 7A and 7B.

FIGs. 7A and 7B illustrate steps of performing multi-dimensional analysis in 
accordance with the present invention. While the figure is presented in 
flowchart format, this is done only for convenience of explanation. The 
illustrated steps may be done in parallel or in any order. Furthermore, some 
steps may be omitted and/or other steps added. Multi-dimensional analysis 
(also called on-line analytical processing or OLAP) generally involves drill 
down, slice and dice and graphical analysis. In drill down, for example, a user
can explore a dimension hierarchically, moving from summary-level
information to detailed information and back, to gain fast answers to critical 
business questions. In slice and dice, for example, a user can interactively 
explore corporate data in any combination of dimensions, from different 
angles or perspectives. In graphical analysis, for example, a user can choose 
from a variety of graphical displays (crosstabs, pie charts and a variety of bar
charts) to visualize key factors that are driving a business. An embodiment of
multi-dimensional analysis is described in further detail in U.S. Patent Appl. 
No. 10/393,677, filed March 19, 2003, which is incorporated herein by 
reference as if reproduced in full below. 
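
For illustration only, the slice-and-dice and drill-down operations may be sketched over a small in-memory list of records. The RECORDS data, the dimension names (retailer, month, concept), and the function names are hypothetical; this sketch is not the embodiment of the incorporated application.

```python
from collections import Counter

# Hypothetical concept occurrences tagged with three dimensions.
RECORDS = [
    {"retailer": "A", "month": "2003-01", "concept": "late fee"},
    {"retailer": "A", "month": "2003-01", "concept": "lost card"},
    {"retailer": "A", "month": "2003-02", "concept": "late fee"},
    {"retailer": "B", "month": "2003-01", "concept": "late fee"},
]


def slice_records(records, **fixed):
    """Slice: keep only records matching the fixed dimension values."""
    return [r for r in records
            if all(r.get(dim) == value for dim, value in fixed.items())]


def roll_up(records, *dims):
    """Summarize: count records for each combination of the given dimensions.

    Adding a dimension to dims drills down from summary to detail.
    """
    return Counter(tuple(r[d] for d in dims) for r in records)
```

Here roll_up(RECORDS, "retailer") gives the summary view, while slicing on concept="late fee" and rolling up by retailer and month drills down into one concept across two dimensions.
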
[0064] The flowchart in FIG. 7A starts at step 702 where control passes to 

step 704. In step 704, the number of objects within the initial set of objects is 
determined. Control then passes to step 706. In step 706, a frequency of 
occurrence of the at least one concept within the initial set of objects is 
determined. Control then passes to step 708. In step 708, a frequency of 
occurrence of the at least one concept within a subset of the initial set of 
objects is determined. Control then passes to step 710. In step 710, a 
frequency of occurrence of the at least one concept within a set outside of the 
initial set of objects is determined. Control then passes to step 712. 
[0065] In step 712, a normalized frequency of occurrence of the at least one 

concept within the initial set of objects is determined. Control then passes to 
step 714 of FIG. 7B. In step 714, a normalized frequency of occurrence of the 
at least one concept within a subset of the initial set of objects is determined. 
Control then passes to step 716. In step 716, a normalized frequency of 
occurrence of the at least one concept outside of the initial set of objects is 
determined. Control then passes to step 718. 
[0066] In step 718, an electronic path to the location of the at least one object 

is determined. Control then passes to step 720. In step 720, at least one 
characteristic of the at least one object is determined. Control then passes to 
step 722. In step 722, at least one concept type for the at least one concept 
within the initial set of objects is determined. Control then passes to step 724.
In step 724, a number of objects which contain a set of concepts within the
initial set of objects is determined. Control then passes to step 726.
[0067] In step 726, a number of objects which contain a set of concepts within 

a set outside of the initial set of objects is determined. Control then passes to 
step 728. In step 728, a definition for the at least one concept is determined. 
Control then passes to step 730. In step 730, a position of the at least one 
concept within each object containing the at least one concept is determined. 
Control then passes to step 732 where the flowchart of FIG. 7B ends. 
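
By way of example only, steps 704 through 716 may be sketched as follows. The sample objects, the term list and the choice of normalizing by the number of objects are assumptions made for illustration; the flowchart does not prescribe a particular normalization.

```python
def concept_frequency(objects, terms):
    """Total occurrences of any term of the concept across the objects."""
    return sum(obj.lower().split().count(t) for obj in objects for t in terms)

def normalized_frequency(objects, terms):
    """Occurrences per object (assumed normalization by object count)."""
    return concept_frequency(objects, terms) / len(objects) if objects else 0.0

# Hypothetical data for the concept Close.
initial_set = ["cust wants to close acct", "xplnd rewards",
               "close acct today", "payment made"]
subset = initial_set[:2]
outside_set = ["close the door", "weather report"]
close_terms = {"close", "clos"}

n_objects = len(initial_set)                                    # step 704
freq_initial = concept_frequency(initial_set, close_terms)      # step 706
freq_subset = concept_frequency(subset, close_terms)            # step 708
freq_outside = concept_frequency(outside_set, close_terms)      # step 710
norm_initial = normalized_frequency(initial_set, close_terms)   # step 712
norm_subset = normalized_frequency(subset, close_terms)         # step 714
norm_outside = normalized_frequency(outside_set, close_terms)   # step 716
```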

VII. Working Example of the Present Invention 

[0068] A working example of the present invention is described next. This 

working example is provided to facilitate the understanding of the present 
invention and is not meant to limit the scope of the invention. 

[0069] Assume a Company XYZ manages credit card services for five 

national retail chains: Retailer A, Retailer B, Retailer C, Retailer D, and 
Retailer E. As part of this service, XYZ runs customer call centers that 
receive calls from customers, answer questions, and provide other services. 
Customer service representatives record the substance of each customer call. 
The customer call records are then stored as free-form text (a/k/a 
"unstructured data") in a column in XYZ's customer relationship management 
database, which also tracks other information related to each call. 

[0070] Assume XYZ wants to analyze 100,000 customer call records to find 

ways to improve its business processes. Previously, XYZ analysts had to 
manually read the records from a randomly selected sample of the calls. 
However, because of the tremendous volume of calls that its call centers 
received, the number of calls the group of analysts could read was statistically 
insignificant. XYZ provides the network address of these call text files, which
may reside on an isolated local computer drive, a local area network (LAN), or a
wide area network (WAN). Furthermore, the location of a related database may be
given as a computer disk drive, a LAN, or a WAN.
[0071] Electronic files stored in the database may be one or more of the
following types: Program File (*.exe), Text File (*.txt, *.prn, *.csv), Word
Document (*.doc), Rich Text Format (*.rtf), Windows Write (*.wri), Word for
Macintosh (*.mcw), MS-DOS Text with Layout (*.asc), Text with Layout
(*.ans), E-mails (*.eml), Outlook Address Book (*.olk), Personal Address
Book (*.pab), WordPerfect file (*.wpd), Schedule+ Contact (*.scd),
PowerPoint (*.ppt), Harvard Graphics Show (*.sh3), Harvard Graphics Chart
(*.ch3), Freelance Windows file (*.pre), Excel File (*.xl*), Adobe Acrobat
File (*.pdf), Web Page (*.htm*, *.asp, *.jsp), Query File (*.*qy), Lotus 1-2-3
File (*.wk*), Quattro Pro/Dos File (*.wq1), Microsoft Works File (*.wks),
Works for Windows (*.wps), Microsoft Access Files (*.mdb), Dbase Files
(*.dbf), SYLK Files (*.slk), Data Interchange Format File (*.dif), Backup File
(*.bak), Quattro Pro 1.0/5.0 (win) (*.wb1), Text Recovered from any File
(*.*), Graphic Interchange Format (*.gif), Windows Bitmap (*.bmp), JPEG
file interchange format (*.jpg), Tag image file format (*.tif), portable network
graphics (*.png), Kodak Photo CD (*.pcd), PC Paintbrush (*.pcx), Raster file
(*.ras), Audio File (*.wav, *.snd, *.aif, *.aifc, *.aiff, *.wma, *.mp3), CD
Audio Track (*.cda), Media Playlist (*.asx, *.wax, *.m3u, *.wvx), MIDI File
(*.mid, *.rmi, *.midi), Movie File (*.mpeg, *.mpg, *.m1v, *.mp2, *.mpa,
*.mpe), Video File (*.avi, *.wmv), Windows Media File (*.asf, *.wm, *.wma,
*.wmv), and Tactile Sensing File in ASCII, LabVIEW, or MATLAB formats.
[0072] These electronic file formats derived from other applications may use 

known transformation functions to extract concepts. For example, text based 
formats may use natural language processing and industry or standard 
references such as thesauri and dictionaries. Graphics based formats may use 
image segmentation and classification application/algorithms for concept 
extraction. Pressure, temperature and other tactile physical sensations such as 
roughness, smoothness and stickiness are reducible to electronic recordings 
and can be abstracted as concepts. Voice recordings may also be abstracted as 
concepts. Olfaction sensor arrays produce recordings that may also be 
abstracted as concepts. Biochemical assays to determine taste concepts such
as sweet, bitter, sour, salty and other differentially identified chemosensory
stimuli may also be abstracted as concepts.
[0073] It is important to note that the concepts abstracted from all of these
varying human perceptions may be stored in textual format, but this is not
required. Some concepts are imperceptible to humans, such as gamma rays in
the electromagnetic spectrum. Furthermore, it may be important to integrate
multiple perceptions simultaneously, such as voice intonation, facial
expression and text containing emotion (such as laughter and sarcasm), to
fully characterize information and remove ambiguity from it. Consider, for
example, the meaning of the sarcastically made statement "You must be a
genius!". Known applications/algorithms, as previously indicated, may extract
concepts from all of these perceptions, within and/or beyond the boundaries
of human perception.
[0074] Although the objects and resulting concepts of the present invention 

may be of any electronic format, the example described herein reduces 
concepts to textual format for analysis by a natural language processing 
algorithm. If the common format is a graphical format, then standard 
segmentation and classification image processing applications/algorithms 
apply. This rule similarly applies for other format domains. 
[0075] The concepts may be refined based upon their relationships to other 

concepts (step 108 of FIG. 1). For example, using natural language processing 
software, XYZ extracts all of the words within its customer call records. The
software automatically ignores commonly-used stop words, such as: "the",
"if", "and", "but", "or", etc. Assume that the words extracted are as follows:

explained, explnd, xplnd, explanation, explain, xpln, expln; 

educate, educ, educat, edcate, educt, edu; 

reward, rewards, rwrds, rwrd, rewrd, rewrds; 

close, cls, clos;

account, acct, accnt, acount, acnt. 
[0076] The reason for the unfamiliar words in the call records is that the
customer service representatives often use a form of short-hand to record the
calls. Using an internal company thesaurus or an industry domain expert,
XYZ creates the following concepts (step 404 of FIG. 4 and/or step 604 of
FIG. 6):

Explain = "explained" or "explnd" or "xplnd" or "explanation"
or "explain" or "xpln" or "expln" (i.e. whenever one of these words
appears, the software will recognize an occurrence of the concept
"Explain");

Educate = "educate" or "educ" or "educat" or "edcate" or "educt"
or "edu";

Reward_Points = "reward" or "rewards" or "rwrds" or "rwrd" or
"rewrd" or "rewrds";

Close = "close" or "cls" or "clos"; and

Account = "account" or "acct" or "accnt" or "acount" or "acnt".
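
Merely as a sketch, and not as the software actually used by XYZ, concept definitions of this kind may be represented as term sets and applied to a call record as shown below. The term lists are abbreviated from the definitions above, and the names CONCEPTS, TERM_TO_CONCEPT and extract_concepts are hypothetical.

```python
import re

# Stop words named in the example.
STOP_WORDS = {"the", "if", "and", "but", "or"}

# Abbreviated term lists; each concept is an "or" over its terms.
CONCEPTS = {
    "Explain": {"explained", "explnd", "xplnd", "explanation", "explain"},
    "Reward_Points": {"reward", "rewards", "rwrds", "rwrd"},
    "Close": {"close", "clos"},
    "Account": {"account", "acct", "accnt", "acnt"},
}

# Invert the definitions so that each word maps to its concept.
TERM_TO_CONCEPT = {term: name for name, terms in CONCEPTS.items() for term in terms}

def extract_concepts(text):
    """Return the concept recognized for each non-stop word, in order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [TERM_TO_CONCEPT[w] for w in words
            if w not in STOP_WORDS and w in TERM_TO_CONCEPT]

found = extract_concepts("Xplnd rwrds and the cust wants to clos acct")
```

Here found holds one concept name per recognized word, so the short-hand "xplnd" and the full word "explained" are treated as the same concept.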

[0077] The frequencies of occurrence of the individual child concepts are
counted and totaled for their parent concept within the initial set of objects
(step 608 of FIG. 6). These frequencies of occurrence within the initial set of
concepts may also be normalized by document count, hit count, or other
standard natural language processing normalization procedures (step 612 of
FIG. 6).
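
As one possible illustration of this counting, the child terms may be tallied individually and then totaled for the parent concept. The sample records are hypothetical, and the two normalizations shown (occurrences per record, and the share of records with at least one hit) are assumed readings of "document count" and "hit count".

```python
from collections import Counter

# Hypothetical call records.
records = [
    "xplnd rewards to cust",
    "explained fee and xplnd rewards",
    "payment made",
]
explain_terms = {"explained", "explnd", "xplnd", "explanation",
                 "explain", "xpln", "expln"}

# Frequency of each child term, then the total for the parent concept.
child_counts = Counter(
    word for rec in records for word in rec.lower().split()
    if word in explain_terms
)
parent_total = sum(child_counts.values())

# Assumed normalization by document count: occurrences per record.
per_document = parent_total / len(records)

# Assumed normalization by hit count: share of records with any occurrence.
hits = sum(1 for rec in records if explain_terms & set(rec.lower().split()))
hit_rate = hits / len(records)
```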

[0078] XYZ then uses an English-language thesaurus to discover that Explain 

and Educate are synonyms (step 504 of FIG. 5 and/or step 606 of FIG. 6). 
Thus, XYZ modifies the definition of Explain so that Educate becomes a 
child-concept of Explain (similarly, Explain becomes the parent-concept of 
Educate). That is to say that Explain is now defined as follows: 

Explain = "explained" or "explnd" or "xplnd" or "explanation" or
"explain" or "xpln" or "expln" or "educate" or "educ" or "educat" or
"edcate" or "educt" or "edu".
[0079] The frequencies of occurrence of the individual child concepts from this
combined internal and external reference are counted and totaled for their
parent concept within and outside the initial set of objects (step 610 of FIG.
6). These frequencies of occurrence outside the initial set of concepts may also
be normalized by document count, hit count, or other standard natural language
processing normalization procedures (step 614 of FIG. 6).
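
The parent and child relationship between Explain and Educate may be sketched, under assumptions, as a small hierarchy in which a parent concept matches its own terms plus those of its child concepts. The dictionaries and the functions expand and parent_frequency are hypothetical names introduced only for this illustration.

```python
concepts = {
    "Explain": {"explained", "explnd", "xplnd", "explanation",
                "explain", "xpln", "expln"},
    "Educate": {"educate", "educ", "educat", "edcate", "educt", "edu"},
}
children = {"Explain": ["Educate"]}  # parent concept -> child concepts

def expand(name):
    """Return a concept's own terms plus those of all of its descendants."""
    terms = set(concepts.get(name, set()))
    for child in children.get(name, []):
        terms |= expand(child)
    return terms

def parent_frequency(records, name):
    """Total occurrences of the concept, including its child concepts."""
    terms = expand(name)
    return sum(1 for rec in records for w in rec.lower().split() if w in terms)

# Hypothetical call records.
records = ["xplnd rewards", "educ cust on fees", "payment made"]
explain_total = parent_frequency(records, "Explain")
educate_total = parent_frequency(records, "Educate")
```

Keeping Educate as a separate entry, rather than copying its terms into Explain, lets a later change to the child concept carry through to the parent automatically.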

[0080] It is the interaction of multiple stakeholders that generates this list
of child concepts for a particular parent concept both within and outside of
the organization. A multi-user environment is important to maintain this
dynamic list, as words may be deleted from or added to any parent concept with
time and events. For example, a new service representative may use "xpn" as a
shorthand for "explain," or a representative whose employment is terminated
may have been the only one who shorthands "explain" with "xpln" (step 202
of FIG. 2). If a concept is deleted, then a second set of concepts is created
from the initial set of concepts because relationships among concepts may
change as a result of this deletion (step 204 of FIG. 2). The multi-dimensional
analysis may be repeated to determine what other concepts may be deleted
(step 206 to step 106 in FIGs. 1 and 2).

[0081] Furthermore, the business may evolve to have a product named
"XPLND," so further child concept refinement is required to separate
documents that refer to the product "XPLND" from those that use the
shorthand for explain, "xplnd." A child concept may be deleted from one
parent concept while a new parent and child concept combination is
simultaneously created (step 208 of FIG. 2). Multi-dimensional analysis may be
repeated until the information void left by the concept deletions is filled
(step 210 to step 106 of FIGs. 1 and 2).
[0082] The method discussed thus far in this example confirms information
already known by stakeholders. Revealing the need for novel business
processes, however, requires performing multi-dimensional analysis on at least
one concept present in the initial set of objects (step 112 of FIG. 1). As an
example, XYZ performs multi-dimensional analysis on the call records while
focusing on the newly defined concepts, as described below:

• First, XYZ drilled down on several occurrences of each of the
concepts in order to view the context and verify that the concept
was correctly identified.

• Second, XYZ performed relationship discovery on the call records 
and discovered that twenty percent (20%) of the time that Explain 
occurred, the concept Reward_Points occurred within two (2) 
words of Explain. Thus, XYZ created a new concept called 
Explain_Reward_Points, which occurred whenever 
Reward_Points occurred within two (2) words of Explain. 

• Third, XYZ performed relationship discovery on the call records 
and discovered that ninety percent (90%) of the time that Close 
occurred, the concept Account occurred within two (2) words of 
Close. Thus, XYZ created a new concept called Close_Account, 
which occurred whenever Close occurred within two (2) words of 
Account. 
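
A minimal sketch of such relationship discovery follows. It assumes that "within two (2) words" means a difference of at most two word positions, and it measures the rate per call record rather than per occurrence; both are simplifying assumptions. The function names and sample records are hypothetical.

```python
def positions(words, terms):
    """Indexes of the words that belong to a concept's term list."""
    return [i for i, w in enumerate(words) if w in terms]

def near(words, terms_a, terms_b, max_gap=2):
    """True if a term of A and a term of B are at most max_gap positions apart."""
    pos_b = positions(words, terms_b)
    return any(abs(i - j) <= max_gap
               for i in positions(words, terms_a) for j in pos_b)

def proximity_rate(records, terms_a, terms_b, max_gap=2):
    """Fraction of the records containing A in which B occurs near A."""
    tokenized = [r.lower().split() for r in records]
    with_a = [w for w in tokenized if positions(w, terms_a)]
    if not with_a:
        return 0.0
    return sum(near(w, terms_a, terms_b, max_gap) for w in with_a) / len(with_a)

close_terms = {"close", "clos"}
account_terms = {"account", "acct", "accnt", "acnt"}
records = [
    "cust wants to clos acct",
    "close the acct now",
    "close window then unrelated words acct",
    "payment made",
]

rate = proximity_rate(records, close_terms, account_terms)

# Records in which the compound concept Close_Account occurs.
close_account_records = [r for r in records
                         if near(r.lower().split(), close_terms, account_terms)]
```

A high rate suggests that the two concepts are worth combining into a compound concept such as Close_Account.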

[0083] At this point, XYZ develops a hypothesis that customer calls seeking 

an explanation of the Reward Points system could be eliminated if the 
explanation were provided on their website or as an automatic option on their 
interactive voice response (IVR) system. This would save XYZ money 
because the number of calls that required human interaction would be reduced 
and they could hire fewer customer service representatives. 

[0084] XYZ again performs multi-dimensional analysis on the call records 

(this time focusing on the newly defined concept Explain_Reward_Points). 
Using summarization, XYZ concludes that Explain_Reward_Points occurred 
in five percent (5%) of the call records analyzed, or 5,000 call records. 

[0085] However, XYZ also realizes that customers call for multiple reasons.
Thus, call records containing Explain_Reward_Points may not necessarily
be eliminated by offering an explanation if the customer also called for
another reason (to close his or her account, for example). This example
involves an increasingly complex and expanding set of objects, along with
their resulting concepts and nested concept relationships. However,
discovered concepts may also be single objects.
[0086] To that end, XYZ again performs multi-dimensional analysis on the
call records (this time focusing on the newly defined concepts
Explain_Reward_Points and Close_Account). Using relationship discovery,
XYZ discovers that ten percent (10%) of the time that
Explain_Reward_Points occurred, Close_Account occurred within the same
call record. This time, XYZ creates a new concept called
Explain_Reward_Points_w/o_Close_Account, which occurs whenever
Explain_Reward_Points occurs and Close_Account DID NOT occur within
the same document.

[0087] XYZ once again performs multi-dimensional analysis on the call
records (this time focusing on the newly defined concept
Explain_Reward_Points_w/o_Close_Account). Using summarization, XYZ
concluded that Explain_Reward_Points_w/o_Close_Account occurred in
four and one-half percent (4.5%) of the call records analyzed, or 4,500 call
records.
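
The exclusion concept and the figures in this example may be checked with a short sketch. The four sample records and the helper without are hypothetical; the percentages and the total of 100,000 call records are those given in the example.

```python
# Hypothetical records, each reduced to the set of concepts found in it.
records = [
    {"Explain_Reward_Points"},
    {"Explain_Reward_Points", "Close_Account"},
    {"Close_Account"},
    {"Explain_Reward_Points"},
]

def without(records, present, absent):
    """Records containing `present` but not `absent` in the same document."""
    return [r for r in records if present in r and absent not in r]

only_explain = without(records, "Explain_Reward_Points", "Close_Account")

# Arithmetic from the example.
total_records = 100_000
explain_count = total_records * 5 // 100   # 5% of the call records
overlap = explain_count * 10 // 100        # 10% also contain Close_Account
remaining = explain_count - overlap        # Explain_Reward_Points_w/o_Close_Account
remaining_share = remaining / total_records
```

With these figures, 5,000 records contain Explain_Reward_Points, 500 of them also contain Close_Account, and the remaining 4,500 are 4.5% of the records analyzed.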
[0088] Next, XYZ wants to determine which retail chains were generating the
most calls seeking an explanation of the Reward Points system. The retail
chain that generated each call is stored in the "Retailer" column of structured
data in XYZ's customer relationship management database. XYZ created five
new concepts to identify which retailer is generating a call:

Retailer_A: occurs whenever the Retailer column of the database =
"Retailer A";

Retailer_B: occurs whenever the Retailer column of the database =
"Retailer B";

Retailer_C: occurs whenever the Retailer column of the database =
"Retailer C";

Retailer_D: occurs whenever the Retailer column of the database =
"Retailer D";

Retailer_E: occurs whenever the Retailer column of the database =
"Retailer E".

[0089] XYZ then creates a new concept type called Retailer. 

[0090] XYZ once again performs multi-dimensional analysis on the call
records (this time focusing on the concept
Explain_Reward_Points_w/o_Close_Account; the concept type Retailer;
and all of its associated concepts: Retailer_A, Retailer_B, Retailer_C,
Retailer_D, & Retailer_E). XYZ is able to slice-and-dice the call records
containing Explain_Reward_Points_w/o_Close_Account and thus view
them by retailer as a report of the resulting analysis (step 804 of FIG. 8).
Then, using data visualization, XYZ is able to easily see that the vast majority 
(75%) of customer calls which were only related to an explanation of the 
Reward Points system came from Retailer C (step 804 of FIG. 8) as shown in 
FIG. 24. 
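
By way of illustration only, viewing the matching call records by retailer may be sketched as follows. The rows are hypothetical and were chosen so that Retailer C accounts for 75% of the matches, mirroring the example; the function retailer_concept is an assumed helper, not part of the described system.

```python
from collections import Counter

TARGET = "Explain_Reward_Points_w/o_Close_Account"

# Hypothetical rows: (Retailer column value, concepts found in the call text).
rows = [
    ("Retailer C", {TARGET}),
    ("Retailer C", {TARGET}),
    ("Retailer C", {TARGET}),
    ("Retailer A", {TARGET}),
    ("Retailer B", {"Close_Account"}),
]

def retailer_concept(value):
    """Map a structured column value to a concept name, e.g. Retailer_A."""
    return value.replace(" ", "_")

# Slice on the target concept, then count by the Retailer concept type.
by_retailer = Counter(
    retailer_concept(retailer) for retailer, found in rows if TARGET in found
)
total = sum(by_retailer.values())
share = {name: count / total for name, count in by_retailer.items()}
```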

[0091] After these iterations with multi-dimensional analysis, it is determined
that no further concept exists that is relevant to the information needs of this
analyst at this particular time (step 114 to step 116 in FIG. 1).

[0092] Although no further concepts from the initial set of concepts exist, the 

resulting outputs lead the analyst to seek additional objects to create a second 
set of objects as part of fulfilling and completing the information exploration 
and determining/recommending corrective action(s) (step 116 of FIG. 1). If 
new and relevant information objects are found, then the method repeats (step 
118 to step 106 in FIG. 1). For example, upon investigation, XYZ discovers 
that Retailer C was the only retailer that did not have an explanation of the 
Reward Points system on its website or as an automated option on their 
interactive voice response (IVR) system. XYZ can quickly remedy the 
situation, reducing its total call volume and cutting costs.
[0093] This example ends when the information needs are satisfied and/or no
more objects or concepts exist or can reasonably be discovered (step 120 of
FIG. 1).

[0094] The resulting exploration of relevant objects and concepts creates at 

least one concept that can be stored in a concept repository (step 902 of FIG. 
9). The initial set of concepts may also be stored in a concept repository 
during refinement. These concept repositories may have read, write and delete 
permission for sharing with other users to permit concept relevancy refinement 
and to identify the information needs boundary within and outside of an 
organization (step 904 of FIG. 9). Users may simultaneously access and refine 
predefined sets of concepts relevant to different goals of an organization such 
as product development, revenue enhancement, cost reduction, competitor 
intelligence, and recruitment. 
[0095] Furthermore, as concepts and objects are refined, new business rules
are created and used in the transformation of other new and pre-existing
objects or data into a datastore (step 1002 of FIG. 10).

VIII. Example Graphical User Interfaces (GUI) of the Present Invention

[0096] FIG. 11 is an example graphical user interface (GUI) of the present
invention that illustrates a toolbar 1102 of options for manipulating objects
within the corpus. The options shown include moving or copying objects to
another corpus or folder, deleting objects, and actions that can be performed
on the objects, such as edit details, analyze and organize. The underlined
filenames of the objects are links to the original files and, when selected,
will open the file in a new browser window 1104. Concepts may be verified and
refined by examining the original document for context. Different concept
repositories or corpora may also be examined in the "view" drop down area
1106, where an expandable file tree is also available. In this example, the
text files have already been extracted and loaded into the application and are
ready for analysis.
[0097] FIG. 12 illustrates a pie chart 1202 that examines the child concept
variation for the parent concept "Payment." Also shown in FIG. 12 is a legend
1204 of the child concept distribution in pie chart 1202. This visualization
permits a user to refine the child concepts by deleting or adding synonyms to
particular child concepts that represent a greater proportion of the pie
(dominant child concepts).

[0098] After the parent and child concepts are refined, the highest frequency
parent concepts may be determined. An examination of these parent concept
frequencies leads to a set of high yield parent concepts that should be further
analyzed. These concepts are labeled as concept type "p" for primary call
reason and are illustrated in the GUI of FIG. 13. FIG. 13 illustrates a
"suggested concepts to analyze" menu 1302, a "selected concepts to analyze"
menu 1304, a frequency of occurrence menu 1306 and a hits menu 1308.

[0099] Menu 1302 shows the "p" parent concepts that are suggested concepts
to analyze. The selected concepts undergoing analysis are shown in menu
1304. Menu 1306 shows the results of the analysis, with the "p" parent
concepts in columns, the documents where their associated child concepts may
be found in rows, and the frequency of occurrence at each intersection.
Analyzing this output is part of the multi-dimensional analysis of drill down as
more complex concepts are discovered. For example, in the first row of menu
1306, "p fee waivers" and "p late fee" occur together for object 36938.txt.
There may be cause to examine the combination of these concepts as a newly
discovered concept, for example "p fee waivers" within 5 words of "p late fee"
within objects. The underlined objects are links to the original objects, which
can be checked for the existence of this complex relationship in menu 1308.

[0100] Concept type "P" represents the possible problems of calls leading to
business expenses that the organization would like to minimize to improve
revenue and profit. In general, concept types relate to aspects or dimensions
within business processes such as products, components, services, actions
taken, processes within and outside the company, symptoms, suppliers, or any
combination of the above. Concept types primarily clarify an analysis for the
user. Thus, concept types facilitate knowledge discovery and lead to new
actionable insights.

[0101] An example of how "p" type concepts are chosen for further analysis is

shown in a graph 1402 of FIG. 14. The abscissa is the quantified cost of a call 
related to the p parent type derived from the analysis. The ordinate is the p 
parent type representing a ranked list by frequency of occurrence of child 
concepts for their respective parent concepts with the highest occurrence at the 
bottom of the graph. A client/domain expert is involved in examining the 
primary call reasons and their quantified costs. Those reasons unknown to the 
client/domain expert with high business value (cost) are chosen for further 
analysis. In this example, "p payment", "p late fee", "p fee waivers", and "p 
close" were expected to be primary call reasons and as such did not surprise 
the client/domain expert. However, "p inquiry", "p explanation", "p advise" 
and "p verification" were not expected, so a further analysis may be desired. 
[0102] Multi-dimensional analysis may be performed on "p inquiry", "p 

explanation", "p advise" and "p verification" (Exp-Ver-Inq-Adv analysis), as 
shown in a menu 1502 of FIG. 15. Merely by way of example, the first step is 
to create folders using the child concepts for these p parent concepts of menu 
1106 (FIG. 11). This example should not unduly limit the scope of the claims 
herein. One of ordinary skill in the art would recognize many variations, 
alternatives, and modifications. As another example, folders need not be 
created but it may be possible to slice and dice across one concept or concept 
type. The objects are then scored and classified based on the best match using 
standard natural language processing applications/algorithms shown in menu 
1502. Those objects that best matched one p parent concept over another were 
sorted into their respective folders. 
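
Merely by way of example, scoring each object against the child concepts of the p parent concepts and sorting it into the best-matching folder may be sketched as follows. The child terms, file names and simple match-count score are assumptions for illustration and do not represent the scoring used by any particular natural language processing application.

```python
def score(words, terms):
    """Number of words in the object that match the concept's child terms."""
    return sum(1 for w in words if w in terms)

def classify(text, concept_terms):
    """Return the best-matching concept name, or None when nothing matches."""
    words = text.lower().split()
    scores = {name: score(words, terms) for name, terms in concept_terms.items()}
    best = max(scores, key=scores.get)  # ties go to the first concept listed
    return best if scores[best] > 0 else None

# Hypothetical child terms for the four p parent concepts.
concept_terms = {
    "p inquiry": {"inquiry", "inq", "asked"},
    "p explanation": {"explained", "xplnd", "explanation"},
    "p advise": {"advised", "advsd", "advise"},
    "p verification": {"verified", "verify", "vrfd"},
}

# Hypothetical objects.
objects = {
    "1001.txt": "cust asked about balance inq",
    "1002.txt": "xplnd late fee and advised cust explained again",
    "1003.txt": "payment received",
}

# Sort each object into the folder of its best-matching concept.
folders = {}
for name, text in objects.items():
    folders.setdefault(classify(text, concept_terms), []).append(name)
```

Objects that match no concept are collected under None here so that they can be reviewed separately.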
[0103] FIG. 16 shows a menu 1606 of a concept matrix of the multi-dimensional
analysis used to find relationships among these p parent concepts and other p
parent concepts. This Exp-Ver-Inq-Adv analysis across other p parent concepts
is also shown in a graph 1702 of FIG. 17, where the abscissa represents the p
parent concepts and the ordinate represents the cost.

[0104] The previous analyses lead to more complex p parent concept
combinations as shown in a graph 1802 of FIG. 18. Here, the abscissa is the
call center cost per year, while the ordinate represents the combined p parent
concepts based on a given relationship. The circled "p verification p options
R1" label represents a non-obvious, complex concept with a high cost that
makes it a candidate for further analysis.

[0105] FIG. 19 shows a graph 1902 of a further drill down of the "p
verification p options R1" concept from FIG. 18. The abscissa and ordinate
are the same as in graph 1802 of FIG. 18. The drill down is further stratified
into various categories until all non-obvious complex concepts have been
analyzed to the furthest possible drill down level. Graph 1902 shows that the
primary reasons for the "p verification p options R1" concept relate
predominantly to payment and mail p parent concepts.

[0106] FIG. 20 shows another menu organization 2002 of the objects by
clients, where the clients are represented by "Alpha," "Delta," "Epsilon,"
"Gamma," "Omega," and "Theta." Folders are created to hold concepts
related to these clients, as shown in menu 1106 (FIG. 11).

[0107] The most interesting concepts may be analyzed on a client-by-client 

basis by examining their frequency of occurrence. This is illustrated in FIG. 
21 by a menu 2102. In menu 2102, the columns hold the complex compound 
concepts (e.g., "P Verification P Options R2" and "P Verification P Payment P 
Options R3") with a given relationship, while the rows represent the different 
clients (e.g., Alpha, Epsilon and Gamma). The scores can be normalized, be 
represented as a percentage of total calls, or be used with other normalization 
algorithms. 
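
One of the normalizations mentioned above, expressing each score as a percentage of a client's total calls, may be sketched as follows. All counts are hypothetical and are not taken from FIG. 21.

```python
# Hypothetical hit counts: client -> compound concept -> number of calls.
hits = {
    "Alpha": {"P Verification P Options R2": 30,
              "P Verification P Payment P Options R3": 10},
    "Epsilon": {"P Verification P Options R2": 5,
                "P Verification P Payment P Options R3": 15},
}

# Hypothetical total calls received per client.
total_calls = {"Alpha": 200, "Epsilon": 100}

# Normalize each cell to a percentage of that client's total calls.
percent = {
    client: {concept: 100.0 * n / total_calls[client]
             for concept, n in row.items()}
    for client, row in hits.items()
}
```

Normalizing by each client's own call volume allows clients of different sizes to be compared on the same scale.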

[0108] The analysis from FIG. 21 may be visualized as part of a
multi-dimensional analysis as illustrated in a bar graph 2202 in FIG. 22. This
graph compares the percent of total client calls received by an individual client
(e.g., Alpha, Epsilon, and Gamma) for the complex concept "p verification p
payment p options" to the total client calls received by all clients for this
concept.

[0109] The absolute cost of these calls is shown in graph 2302 of FIG. 23,
where these same three clients (i.e., Alpha, Epsilon, and Gamma) are
examined with respect to absolute cost on the ordinate for the complex
concept "p verification p payment p options."

IX. Conclusion 

[0110] Embodiments of the method of the present invention can be performed 

using a computer software system of the type sold by Intelligenxia, Inc. of 
Jacksonville, Florida. The Intelligenxia system is described, in part, in the 
above-referenced U.S. patent applications. Modifications and extensions to 
the Intelligenxia system necessary to implement the present invention will be 
apparent to a person skilled in the art based on the disclosure set forth herein. 

[0111] While exemplary embodiments of the present invention have been 

described above, it should be understood that these embodiments have been 
presented by way of example only, and are not meant to limit the scope of the 
invention. It will be understood by those skilled in the art that various changes 
in form and detail may be made therein without departing from the spirit and 
scope of the invention as defined in the appended claims. Thus, the breadth 
and scope of the present invention should not be limited by the above- 
described exemplary embodiments, but should be defined only in accordance 
with the following claims and their equivalents. Each document cited herein 
is hereby incorporated by reference in its entirety. 



